Accuracy of quantum-state estimation utilizing Akaike's information criterion

We report our theoretical and experimental investigations into errors in quantum state estimation,
putting a special emphasis on their asymptotic behavior. Tomographic measurements and maximum
likelihood estimation are used for estimating several kinds of identically prepared quantum states
(bi-photon polarization states) produced via spontaneous parametric down-conversion. Excess errors
in the estimation procedures are eliminated by introducing a new estimation strategy utilizing
Akaike's information criterion. We make a quantitative comparison between the errors of the
experimentally estimated states and their asymptotic lower bounds, which are derived from the
Cramér-Rao inequality. Our results reveal the influence of entanglement on the errors in the
estimation. An alternative measurement strategy employing inseparable measurements is also
discussed, and its performance is numerically explored.

PACS numbers: 03.67.-a, 42.50.-p, 89.70.+c

I. INTRODUCTION



One of the central features of quantum mechanics is that it does not allow one to
simultaneously obtain complete information about an individual quantum system without
errors [1]. The Holevo bound on the accessible information and the no-cloning theorem are
prominent manifestations of the restrictions on acquiring information from quantum systems,
and these restrictions culminate in quantum cryptography.

However, there are no obstacles to estimating all aspects of quantum states by a series of
distinct measurements on identically prepared particles, i.e., by quantum state tomography [3].
The pioneering experimental demonstration of this method was accomplished by Smithey et
al. [4]. They determined the Wigner function of the vacuum and of a pulsed squeezed-vacuum
state of a spatial-temporal mode using homodyne tomography. Schiller et al. [5] applied this
method to the estimation of a density matrix (in the number-state representation) for a
squeezed vacuum state of two spectral components. In this experiment, spectacular even-odd
oscillations in the photon-number distribution were observed. Recently, Lvovsky et al. [6]
and Bertet et al. have respectively succeeded in reconstructing the Wigner function of a
single-photon Fock state of a travelling spatial-temporal mode and that of an intra-cavity
mode. Both estimated Wigner functions showed a dip reaching classically impossible negative
values around the origin of phase space. For the polarization degree of freedom of the
electromagnetic field, White et al. used quantum state tomography, for the first time, to
characterize non-maximally entangled states produced from a spontaneous-down-conversion
photon source. Kwiat et al. utilized this method for the verification of the decoherence-free
character of a particular entangled state, and for the demonstration of hidden non-locality
of entangled mixed states.

In spite of these splendid experimental achievements with quantum state tomography,
statistical errors in estimating quantum states have received little attention so far.
Statistical analyses of errors in quantum-state estimation should not be undervalued. Since
every outcome of a measurement is represented as a random variable in quantum mechanics,
statistical analyses of the errors may reveal profound rules for acquiring information from
quantum systems. Moreover, such analyses may also contribute to the development of quantum
information technology, which requires us to faithfully prepare several kinds of quantum
states, and to the improvement of the sensitivity of various kinds of precision measurements,
which is limited by quantum noise.

In this article, we report our theoretical and experimental analyses of errors in quantum
state estimation, putting a special emphasis on their asymptotic behavior. In particular, we
focus on the estimation of the state of two qubits (two 2-level quantum systems). The
two-qubit system in a 4-dimensional Hilbert space is the simplest one in which the peculiar
characteristic of quantum mechanics, entanglement, appears. Since entanglement plays a
critical role in the mysterious phenomena of the quantum world, it is interesting to ask
whether entanglement affects the accuracy of the estimation. Various kinds of two-qubit
states (including entangled states) are practically realizable as polarization states of
bi-photons produced via spatially-nondegenerate, type-I spontaneous parametric
down-conversion (SPDC). The procedure to estimate the state of two qubits has been well
established by James, Kwiat, Munro and White [17]. Thus, in our experiments, we followed the
above methods for producing the ensembles of the bi-photon polarization states, for measuring
them, and for estimating their density matrices.

The main purpose of this article is to show quantitatively the limit on the accuracy of
quantum-state estimation. We demonstrate that the accuracy depends on the state to be
estimated and also on the measurement strategy. To do so, we introduce a new strategy of
quantum-state estimation utilizing Akaike's information criterion (AIC) [20] for eliminating
a numerical problem in the estimation procedures, especially in estimating (nearly) pure
quantum states. While the number of parameters used for characterizing density matrices of
quantum states is fixed in the conventional estimation strategies, the number is varied in
the new strategy so as to eliminate redundant parameters. Consequently, we can quantitatively
compare the experimentally evaluated errors in the estimation with their asymptotic lower
bound derived from the Cramér-Rao inequality, without being troubled by the delicate
numerical problem accompanying the redundant parameters. It is shown that the errors of the
experimental results nearly achieve their lower bounds for all quantum states we examined.
Moreover, owing to the reduction of the parameters, the new AIC-based estimation strategy
slightly lowers the lower bounds.

Our results reveal that when measurements are performed locally (i.e., separately) on each
qubit, the existence of entanglement may degrade the accuracy of the estimation. Thus, while
the measurements in our experiments are local ones, we also numerically examine the
performance of an alternative measurement strategy, which includes inseparable measurements
on two qubits.

The remainder of the article is organized as follows. In Sec. II we present our experimental
analyses of errors in estimating density matrices as a function of the ensemble size, i.e.,
for varying data acquisition time. In Sec. III we present a prescription for calculating the
asymptotic lower bounds on the errors in terms of fidelity and show that, in the asymptotic
region, the errors should decrease in inverse proportion to the ensemble size. Then we
compare the lower bounds with the experimental results. In Sec. IV a new strategy of quantum
state estimation utilizing Akaike's information criterion is introduced, and the accuracy of
the state estimated by this new strategy is presented. In Sec. V the alternative measurement
strategy for two qubits, which employs inseparable measurements, is numerically explored.
Section VI summarizes this article. In the Appendix, we briefly review tomographic
measurements and maximum likelihood estimation for estimating two qubits, and derive the
Cramér-Rao lower bound on the errors in the estimation.

FIG. 1: Experimental setup for producing various polarization states of bi-photons and
measuring them.



II. EXPERIMENT 
A. experimental setup 

For experimentally producing various quantum states of two qubits, we use the method of
creating various polarization states of bi-photons via spatially-nondegenerate, type-I
spontaneous parametric down-conversion (SPDC). The method was invented by Kwiat et al. and
has been applied to various experiments.

A rough sketch of the experimental setup is shown in Fig. 1. Two thin (0.13 mm) beta-barium
borate (β-BaB2O4, BBO) crystals, which are cut to satisfy type-I phase matching, are placed
adjacent to each other so that their optical axes lie in mutually perpendicular planes.
Inside the crystals, the third-harmonic beam (wavelength: 266 nm, average power: 190 mW) of a
mode-locked Ti:Sapphire laser (pulse duration: 80 fs, repetition rate: 82 MHz), which we call
the pump beam, is slightly converted into frequency-degenerate but spatially-nondegenerate
(opening angle: 3°) bi-photons (wavelength: 532 nm) via SPDC. This configuration makes it
possible to produce various polarization states of bi-photons (including entangled states) by
adjusting the pump beam polarization with a half-wave plate (HWP) [16], by modifying the
relative time delay between the horizontal and vertical components of the pump beam with a
pre-compensator (which consists of quartz plates and a variable wave plate (WP)) [19], and by
inserting decoherers (two de-polarizers) into one of the paths of the down-converted photons,
as shown in Fig. 1. We produced three particular quantum states, the very noisy mixed state
(VNMS), the almost pure and separable state (APSS), and the highly entangled state (HES), for
inspecting the influence of various characteristics (e.g., entropy and entanglement) of the
states on the accuracy of the estimation.

The produced polarization states of bi-photons were estimated by tomographic measurements and
maximum likelihood estimation (MLE). These procedures are reviewed in Appendices A 1 and A 2.
In the tomographic measurements, the coincident detection events (within 6 ns) on both
single-photon detectors (HAMAMATSU H7421-40) were counted by using a time interval analyzer
(YOKOGAWA TA-520) during the data acquisition time $t$ at each polarizer setting (i.e.,
projector) $|m_\nu\rangle\langle m_\nu|$ (which was determined and varied by the half-wave
plate (HWP), the quarter-wave plate (QWP), and the polarizer (Pol) on each path of the
produced photons). For investigating the ensemble-size dependence of the accuracy, we varied
the data acquisition time $t$ of each measurement as 0.2 s, 0.5 s, 1.0 s, 2.0 s, and 5.0 s.
The typical single counting rate was about 30000 c/s with a dark counting rate of about
300 c/s. The typical coincidence counting rate was roughly 500 c/s, with the accidental
coincidence counts being below 1% of the genuine coincidence counts. For eliminating ambient
photons, we used interference filters (FWHM: 8 nm) (see Ref. [19] for more detailed
information).

B. experimental procedure 

In order to assess the accuracy of the estimation, we repeated the measurement and estimation
procedures 9 times for each state and each ensemble size. Here, as noted in Appendix A 2, the
density matrix of the two-qubit state (two 2-level quantum systems) can be written as

$$ \rho_\Theta = \frac{T_\Theta T_\Theta^\dagger}{\mathrm{Tr}[T_\Theta T_\Theta^\dagger]}, $$

which satisfies the positivity condition and the trace condition for density matrices
[13, 21]; see Appendix A 2. As a result of the 9 identical trials, we had 9 slightly different
density matrices $\{\rho_{\hat\Theta_i}\}_{i=1}^{9}$. The differences among these states
might stem not only from statistical errors but also from experimental systematic ones. To
reduce the systematic errors, we restricted the data acquisition time $t$ at each polarizer
setting to at most $t = 5$ s, so as to keep the experimental conditions unchanged (especially,
to keep the pump power constant during the whole data acquisition time, $t \times 16$
measurements).

Then we evaluated the accuracy of the estimation in terms of the average fidelity between the
true state $\rho_{\Theta_0}$ and the estimated states $\{\rho_{\hat\Theta_i}\}_{i=1}^{9}$,
i.e.,

$$ \bar F(\rho_{\Theta_0}, \rho_{\hat\Theta}) \approx \frac{1}{9}\sum_{i=1}^{9} F(\rho_{\Theta_0}, \rho_{\hat\Theta_i}), \tag{1} $$

where the fidelity $F(\rho_1,\rho_2)$ is equal to
$\mathrm{Tr}\big[\sqrt{\sqrt{\rho_1}\,\rho_2\,\sqrt{\rho_1}}\big]$ [21, 23, 24]. As the true
state $\rho_{\Theta_0}$ for each of the three states concerned (the VNMS, the APSS, and the
HES), we employed the state estimated by the MLE using the whole data acquired for that
state. This means that the effective data acquisition time for determining the true state
amounts to t = 0.2 s × 9 trials + 0.5 s × 9 trials + 1.0 s × 9 trials + 2.0 s × 9 trials +
5.0 s × 9 trials = 78.3 s. Later on we use these three true states as sources to produce
artificial sets of 16 coincidence counts for the numerical simulations. These simulations are
performed without considering any systematic errors. Thus we can evaluate to what extent the
systematic errors affect the total errors.
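The fidelity and its average over repeated trials, as defined above, can be sketched numerically as follows. This is a minimal illustration, not the authors' code: the function names (`psd_sqrt`, `fidelity`, `average_fidelity`) and the two example states are assumptions introduced here, and the unsquared convention $F = \mathrm{Tr}[\sqrt{\sqrt{\rho_1}\rho_2\sqrt{\rho_1}}]$ of the text is used.

```python
import numpy as np

def psd_sqrt(rho):
    """Matrix square root of a Hermitian positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(rho)
    vals = np.clip(vals, 0.0, None)           # discard tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def fidelity(rho1, rho2):
    """F = Tr[sqrt(sqrt(rho1) rho2 sqrt(rho1))], the unsquared convention of the text."""
    s = psd_sqrt(rho1)
    inner = s @ rho2 @ s
    inner = (inner + inner.conj().T) / 2      # enforce Hermiticity numerically
    vals = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    return float(np.sum(np.sqrt(vals)))

def average_fidelity(rho_true, estimates):
    """Mean fidelity between one reference state and a list of estimated states."""
    return float(np.mean([fidelity(rho_true, r) for r in estimates]))

# Illustrative two-qubit states in the basis |HH>, |HV>, |VH>, |VV> (hypothetical examples).
rho_hh = np.zeros((4, 4), dtype=complex)
rho_hh[0, 0] = 1.0                            # pure separable state |HH><HH|
rho_mixed = np.eye(4, dtype=complex) / 4      # maximally mixed state
```

In an actual analysis, `estimates` would hold the 9 density matrices reconstructed from the 9 trials, and `rho_true` the state estimated from the whole data set.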

C. experimental results 

To visualize the fluctuation of the estimation, we use a matrix, which is explained in detail
in Fig. 2. This matrix has 12 × 12 elements and is composed of 9 matrices (4 × 4), each of
which is the real part of the density matrix estimated in one trial. Figure 3 shows the
matrices for the three states (i.e., the VNMS, the APSS, and the HES) for two different data
acquisition times, t = 0.2 s and t = 5.0 s. Here the bases of the density matrices are
$|HH\rangle$, $|HV\rangle$, $|VH\rangle$ and $|VV\rangle$ (these notations are defined in
Appendix A 1). The entropy (von Neumann entropy) and entanglement (entanglement of formation)
of each resulting true state, which is obtained by using the whole data as mentioned before,
are also shown in Fig. 3. Note that the density matrices had only small imaginary parts in
all cases; thus they are not presented. We can observe that the fluctuation of each element
is reduced as the data acquisition time $t$ becomes longer (from 0.2 s to 5.0 s) for all three
states.

In Fig. 4, the fluctuations of the estimated density matrices are shown quantitatively in
terms of the average fidelities between the true state $\rho_{\Theta_0}$ and the estimated
states $\{\rho_{\hat\Theta_i}\}$ as a function of the ensemble size. The ensemble size
corresponds to the nuisance parameter of the estimation, $\Lambda = \lambda t$, where
$\lambda \approx 500$ is the coincidence counting rate and $t$ is the data acquisition time of
each measurement (see Appendix A 1 for the detailed explanation). Each filled plot corresponds
to the experimental result of the average fidelity, Eq. (1). To supplement the experiments,
we also carried out Monte Carlo simulations by artificially producing sets of 16 coincidence
counts, $\{N\}$, according to their true states $\rho_{\Theta_0}$ and to the probability mass
function given by Eq. (A10). The estimation procedures are the same as in the experiments
except for the absence of systematic errors. These simulations were repeated 200 times;
therefore, the blank plots correspond to

$$ \bar F(\rho_{\Theta_0}, \rho_{\hat\Theta}) \approx \frac{1}{200}\sum_{i=1}^{200} F(\rho_{\Theta_0}, \rho_{\hat\Theta_i}). \tag{2} $$
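The generation of artificial coincidence-count data described above can be illustrated with a short sketch. Several ingredients here are assumptions rather than details taken from the text: the 16 projectors are taken to be products of the single-photon states H, V, D, R (the actual settings are defined in the Appendix), the counts are drawn from independent Poisson distributions with mean $\lambda t\,\mathrm{Tr}[|m_\nu\rangle\langle m_\nu|\rho]$, and all function names are hypothetical.

```python
import numpy as np

H = np.array([1, 0], dtype=complex)
V = np.array([0, 1], dtype=complex)
D = (H + V) / np.sqrt(2)
R = (H - 1j * V) / np.sqrt(2)   # circular state; the sign convention is an assumption

def projectors():
    """16 product projectors |m><m| built from {H, V, D, R} on each photon (assumed set)."""
    single = [H, V, D, R]
    return [np.outer(np.kron(a, b), np.kron(a, b).conj()) for a in single for b in single]

def expected_counts(rho, lam_t):
    """Mean coincidence counts lam_t * Tr[|m><m| rho] for each of the 16 settings."""
    means = [lam_t * np.real(np.trace(P @ rho)) for P in projectors()]
    return np.clip(np.array(means), 0.0, None)

def simulate_counts(rho, lam_t, rng):
    """Draw one artificial data set of 16 Poisson-distributed coincidence counts."""
    return rng.poisson(expected_counts(rho, lam_t))

rng = np.random.default_rng(0)
rho_mixed = np.eye(4, dtype=complex) / 4
counts = simulate_counts(rho_mixed, lam_t=500 * 5.0, rng=rng)   # lambda ~ 500 c/s, t = 5 s
```

Repeating `simulate_counts` 200 times and feeding each data set to the estimation routine would mimic the simulation loop described in the text, free of systematic errors by construction.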

Note that the number of repetitions of the simulations (200) may be large enough to ensure
their statistical confidence. The results of the numerical simulations are in good agreement
with those of the experiments. Thus the systematic errors seem to be negligible under our
experimental conditions (i.e., for the relatively short data-acquisition time). Nevertheless,
we remark on some sources of our systematic errors: the fluctuation of the pump power for
SPDC during the 16 measurements (about 1.0%), the uncertainty of the wave-plate angular
settings in the tomographic measurements (about 0.05°), the finite extinction ratio of the
polarizers, and the accidental coincidence counts (about 1% of the genuine coincidence
counts).

FIG. 2: To show the fluctuation of the estimation, we use the matrix shown on the far
right-hand side of (a). This matrix has 12 × 12 elements and is composed of the 9 matrices
(4 × 4) shown on the left-hand side of (a). Each of the 9 matrices is the real part of the
density matrix estimated in one trial. The relation between the elements of the 12 × 12
matrix and those of the 4 × 4 matrices is illustrated in (a). As an example, the 12 × 12
matrix and the constituent 4 × 4 density matrices for the VNMS are shown in (b). The bases of
the density matrices are $|HH\rangle$, $|HV\rangle$, $|VH\rangle$ and $|VV\rangle$ (these
notations are defined in Appendix A 1). In this way, fluctuations of estimated density
matrices can be visualized.



III. ACCURACY OF QUANTUM STATE ESTIMATION

Suppose that there are two quantum states, $\rho_1$ and $\rho_2$; how can the distance
between these two quantum states be measured? One possible answer is known as the Bures
distance [23, 25, 26]:

$$ d_{\rm Bures}(\rho_1,\rho_2)^2 = 2\left(1 - \mathrm{Tr}\Big[\sqrt{\sqrt{\rho_1}\,\rho_2\,\sqrt{\rho_1}}\Big]\right) = 2\big(1 - F(\rho_1,\rho_2)\big), \tag{3} $$

where $F(\rho_1,\rho_2)$ is the fidelity. In Sec. II we have already evaluated the accuracy of
the estimation in terms of the fidelities between the true state, $\rho_{\Theta_0}$, and the
estimated states, $\rho_{\hat\Theta}$. We will derive the highest accuracy that is, in
principle, attainable by our tomographic measurements in terms of the Bures distance. Then it
is compared with the experimental results.

Assuming that the estimated states $\rho_{\hat\Theta}$ are in the neighborhood of the true
state $\rho_{\Theta_0}$, the average Bures distance between them can be written as
[23, 26, 28]

$$ \overline{d_{\rm Bures}(\rho_{\Theta_0},\rho_{\hat\Theta})^2} \approx \frac{1}{4}\sum_{i=1}^{16}\sum_{j=1}^{16} J^{SLD}_{ij}(\Theta_0)\,\overline{(\hat\theta_i(N)-\theta_{0,i})(\hat\theta_j(N)-\theta_{0,j})}, \tag{4} $$

where $\{\theta_{0,i}\}_{i=1}^{16} = \{\Theta_0\}$ are the true parameters characterizing the
true state $\rho_{\Theta_0}$ and $\{\hat\theta_i(N)\}_{i=1}^{16}$ are their estimates
inferred from the results of the tomographic measurements $\{N\} = \{n_\nu\}_{\nu=1}^{16}$;
see Appendices A 1 and A 2. Here $[J^{SLD}_{ij}(\Theta)] = J^{SLD}(\Theta)$ is a 16 × 16
matrix obtained in the following manner. First, we define a Hermitian operator
$L^S_i(\Theta)$, called the symmetric logarithmic derivative (SLD), by

$$ \frac{\partial \rho_\Theta}{\partial \theta_i} = \frac{1}{2}\big(L^S_i(\Theta)\,\rho_\Theta + \rho_\Theta\,L^S_i(\Theta)\big). \tag{5} $$

The SLD $L^S_i(\Theta)$ can be obtained by solving the equation above and can be considered a
quantum analogue of the score (classically, the score is defined by
$\partial \ln[P(N|\Theta)]/\partial\theta_i$, as noted in Appendix A 3). Then the matrix
$J^{SLD}(\Theta)$, called the symmetric logarithmic derivative Fisher information matrix (SLD
Fisher information matrix), is given by

$$ J^{SLD}_{ij}(\Theta) = \frac{1}{2}\mathrm{Tr}\big[\rho_\Theta\big(L^S_i(\Theta)L^S_j(\Theta) + L^S_j(\Theta)L^S_i(\Theta)\big)\big]. \tag{6} $$

FIG. 3: The 12 × 12 matrices (each composed of 9 density matrices (4 × 4)) for the three
states, i.e., the VNMS, the APSS, and the HES, for two different data acquisition times,
t = 0.2 s (left) and t = 5.0 s (right). The detailed explanation of each element of the
matrices is provided in Fig. 2. The entropy (von Neumann entropy) and entanglement
(entanglement of formation) of each resulting true state (which is obtained by using the
whole data as mentioned in the text) are also presented. Note that only the real parts of the
density matrices are exhibited, because the imaginary parts are quite small compared with the
real parts.



From Eq. (4), we can see that the Bures distance is locally equivalent to a distance on a
Riemannian manifold equipped with a metric structure defined by the SLD Fisher information
matrix. This recognition furnishes us with a geometrical picture of quantum state estimation.

We note here that the SLD Fisher information matrix, Eq. (6), was originally introduced for
extending classical parameter estimation theory to its quantum counterpart and for
formulating the quantum Cramér-Rao type lower bound on the errors in estimating quantum
states.
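As a concrete illustration of the SLD and the SLD Fisher information matrix defined above, the following sketch solves the defining equation in the eigenbasis of the state. It is an illustration under stated assumptions, not the procedure used in the paper: the function names are hypothetical, matrix elements for which the sum of eigenvalues vanishes are simply set to zero (one particular choice for degenerate states), and the worked example is a one-parameter, single-qubit family rather than the 16-parameter two-qubit model.

```python
import numpy as np

def sld(rho, drho, tol=1e-12):
    """Solve drho = (L rho + rho L) / 2 for Hermitian L in the eigenbasis of rho."""
    p, U = np.linalg.eigh(rho)
    d = U.conj().T @ drho @ U                 # derivative in the eigenbasis of rho
    denom = p[:, None] + p[None, :]
    L = np.zeros(d.shape, dtype=complex)
    mask = denom > tol                        # entries with p_a + p_b = 0 are set to zero
    L[mask] = 2.0 * d[mask] / denom[mask]
    return U @ L @ U.conj().T

def sld_fisher(rho, drhos):
    """J_ij = (1/2) Tr[rho (L_i L_j + L_j L_i)] for a list of derivatives d(rho)/d(theta_i)."""
    Ls = [sld(rho, d) for d in drhos]
    n = len(Ls)
    J = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            anti = Ls[i] @ Ls[j] + Ls[j] @ Ls[i]
            J[i, j] = 0.5 * np.real(np.trace(rho @ anti))
    return J

# One-parameter, single-qubit toy family rho(theta) = (I + theta * sigma_z) / 2.
theta = 0.5
sigma_z = np.diag([1.0, -1.0])
rho = (np.eye(2) + theta * sigma_z) / 2
drho = sigma_z / 2
J = sld_fisher(rho, [drho])                   # analytically 1 / (1 - theta**2)
```

For this toy family the SLD is diagonal, and the SLD Fisher information reduces to the classical Fisher information of a biased coin, $1/(1-\theta^2)$.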

Our aim here is to evaluate the best accuracy in estimating identically prepared quantum
states $\rho_{\Theta_0}$ by tomographic measurements. In other words, our aim is to find the
minimum Bures distance between the true states and the estimated states that can be attained
by our tomographic measurements. This can be accomplished by decreasing the value
$(\hat\theta_i(N)-\theta_{0,i})(\hat\theta_j(N)-\theta_{0,j})$ of Eq. (4) as much as
possible. As is derived in Appendix A 3, the lower bound on the covariance
$V_{ij}(\Theta_0) \equiv E_{\Theta_0}[(\hat\theta_i(N)-\theta_{0,i})(\hat\theta_j(N)-\theta_{0,j})]$
can be obtained by the Cramér-Rao inequality

$$ V(\Theta_0) \ge J^{-1}(\Theta_0), \tag{7} $$

where $E_{\Theta_0}[f(N)]$ means averaging over the true probability mass function of $\{N\}$
given by Eq. (A10) with $\Theta = \Theta_0$; $J_{ij}(\Theta_0)$ is the Fisher information
matrix (see Appendix A 3). Since the lower bound on the covariance is known to be
asymptotically achievable by using the MLE, the achievable lower bound on the Bures distance
can be given by

$$ \overline{d_{\rm Bures}(\rho_{\Theta_0},\rho_{\hat\Theta})^2} \approx \frac{1}{4}\sum_{i=1}^{16}\sum_{j=1}^{16} J^{SLD}_{ij}(\Theta_0)\,V_{ij}(\Theta_0) \ge \frac{1}{4}\mathrm{Tr}\big[J^{SLD}(\Theta_0)\,J^{-1}(\Theta_0)\big]. \tag{8} $$

From Eqs. (A3), (A10), and (A19), the Fisher information matrix $J_{ij}(\Theta_0)$ can be
rewritten as

$$ J_{ij}(\Theta_0) \approx \Lambda\,\tilde J_{ij}(\Theta_0) = t\lambda\,\tilde J_{ij}(\Theta_0), \tag{9} $$

FIG. 4: The errors in estimating the three states (the VNMS, the APSS, and the HES), in terms
of the fidelities between the true states and the estimated states, are shown as a function
of the nuisance parameter $\lambda t$ ($\lambda \approx 500$ is the coincidence counting rate
and $t$ is the data acquisition time of each measurement). The filled plots represent
experimental results (the error bars are omitted) and the blank plots represent numerical
simulations.

FIG. 5: The average Bures distances (divided by 2) between the true states and the states
estimated by the MLE are shown as a function of the nuisance parameter $\lambda t$. The
filled plots represent experimental results (the vertical error bars correspond to one
standard deviation) and the blank plots represent numerical simulations. The inset shows
their asymptotic lower bounds.



where

$$ \tilde J_{ij}(\Theta_0) = -E_{\Theta_0}\!\left[\left.\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\ln[P(N|\Theta)]\right|_{\Theta=\Theta_0}\right], \tag{10} $$

with

$$ \cdots\ \mathrm{Tr}[|m_\nu\rangle\langle m_\nu|\rho_\Theta]\ \cdots \qquad (\nu = 1, \ldots, 16). \tag{11} $$

Therefore,

$$ V(\Theta_0) \ge J^{-1}(\Theta_0) = \frac{1}{t\lambda}\,\tilde J^{-1}(\Theta_0), \tag{12} $$

that is, in the asymptotic regime, the errors (the covariance) of the maximum likelihood
estimates should decrease in inverse proportion to the data acquisition time $t$.
Consequently, from Eqs. (8) and (12), we have

$$ 2\big(1 - \bar F(\rho_{\Theta_0},\rho_{\hat\Theta})\big) \ge \frac{1}{4t\lambda}\mathrm{Tr}\big[J^{SLD}(\Theta_0)\,\tilde J^{-1}(\Theta_0)\big], \tag{13} $$

or, equivalently,

$$ \ln\big[1 - \bar F(\rho_{\Theta_0},\rho_{\hat\Theta})\big] \ge -\ln[\Lambda] + \ln\Big[\frac{1}{8}\mathrm{Tr}\big[J^{SLD}(\Theta_0)\,\tilde J^{-1}(\Theta_0)\big]\Big]. \tag{14} $$

The logarithm of the average Bures distance between the true state and the estimated state is
thus expected to decrease linearly with the logarithm of the nuisance parameter,
$\Lambda = t\lambda$ (the first term on the right-hand side of Eq. (14)), and all
state-dependent properties appear as the intercept on the ordinate (the second term on the
right-hand side of Eq. (14)).
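The scaling expressed by the bound above can be checked with a few lines of arithmetic. In this sketch the bound on $1 - \bar F$ is taken to be $\mathrm{Tr}[J^{SLD}\tilde J^{-1}]/(8\lambda t)$; the value of the trace term is a made-up number used only for illustration, while the counting rate and acquisition times are those quoted in the text. The function name is hypothetical.

```python
import numpy as np

def infidelity_bound(lam_t, trace_term):
    """Asymptotic lower bound on 1 - F, taken here as trace_term / (8 * lam_t)."""
    return trace_term / (8.0 * np.asarray(lam_t, dtype=float))

lam = 500.0                                   # coincidence rate quoted in the text (c/s)
t = np.array([0.2, 0.5, 1.0, 2.0, 5.0])       # data acquisition times used in the experiment (s)
trace_term = 40.0                             # hypothetical value of Tr[J_SLD J~^-1]
bound = infidelity_bound(lam * t, trace_term)

# On a log-log plot the bound is a straight line of slope -1; the state-dependent
# trace term only shifts the intercept.
slope, intercept = np.polyfit(np.log(lam * t), np.log(bound), 1)
```

With these numbers the ensemble sizes are $\lambda t$ = 100, 250, 500, 1000, and 2500, the fitted slope is -1, and the intercept equals $\ln(\text{trace term}/8)$.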



We note that there are some difficulties in practically calculating the asymptotic lower
bound. First, as can be seen in Eq. (A18), the calculation of the Fisher information matrix
$J_{ij}(\Theta_0)$ in Eq. (12) includes 16 infinite sums for computing an average over the
probability mass function (A10). To circumvent this difficulty, we make the approximation

$$ J_{ij}(\Theta_0) \approx \frac{1}{1000}\sum_{q=1}^{1000} \left.\frac{\partial}{\partial\theta_i}\ln[P(N^{(q)}|\Theta)]\,\frac{\partial}{\partial\theta_j}\ln[P(N^{(q)}|\Theta)]\right|_{\Theta=\Theta_0}, \tag{15} $$

where $\{N^{(q)}\}$ is numerically simulated (1000 times) according to the Gaussian
approximation of $P(N|\Theta_0)$ in Eq. (A10). Second, the inverse of the Fisher information
matrix (15) should be derived from the so-called Moore-Penrose generalized inverse [41] in
case the determinant of the Fisher information matrix becomes zero (the Moore-Penrose
generalized inverse provides a unique and well-behaved inverse even for such degenerate
matrices). Third, the SLD $L^S_i(\Theta)$ of Eq. (6) is not uniquely determined from Eq. (5),
except for non-degenerate states (the states whose eigenvalues are all non-zero, that is, the
strictly positive states, or the rank-4 states in our specific example). This problem was
investigated by Fujiwara and Nagaoka [28], Hayashi [37], and Matsumoto [40] for pure states,
and by Fujiwara, Matsumoto [30], and Fujiwara and Nagaoka for more general degenerate states.
According to their results, any SLD derived from Eq. (5) results in the same SLD Fisher
information matrix $J^{SLD}(\Theta_0)$. For this reason, we solved Eq. (5) by using the
Moore-Penrose generalized inverse, which provides a unique solution for $L^S_i(\Theta)$, and
regarded it as a representative of the SLDs.
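The first two points above (a Monte Carlo estimate of the Fisher information from simulated data, and a generalized inverse for a degenerate Fisher matrix) can be illustrated on a deliberately small toy model. Everything specific here is an assumption made for the example: the model has three Poisson-distributed counts whose means depend only on the sum of two parameters, so the parameters are redundant and the Fisher matrix is singular; the samples are drawn from the Poisson distribution directly rather than from a Gaussian approximation; and the function names are hypothetical.

```python
import numpy as np

def poisson_means(theta, lam_t, weights):
    """Toy model: mean counts lam_t * (theta_1 + theta_2) * w_nu; theta_1, theta_2 are redundant."""
    return lam_t * (theta[0] + theta[1]) * weights

def score(counts, theta, lam_t, weights):
    """Gradient of the Poisson log-likelihood with respect to (theta_1, theta_2)."""
    means = poisson_means(theta, lam_t, weights)
    dmeans = lam_t * weights                  # d(mean_nu)/d(theta_i), identical for i = 1, 2
    s = np.sum((counts / means - 1.0) * dmeans)
    return np.array([s, s])

def fisher_monte_carlo(theta, lam_t, weights, n_samples, rng):
    """Average of score outer products over simulated data sets."""
    J = np.zeros((2, 2))
    for _ in range(n_samples):
        counts = rng.poisson(poisson_means(theta, lam_t, weights))
        s = score(counts, theta, lam_t, weights)
        J += np.outer(s, s)
    return J / n_samples

weights = np.array([0.2, 0.3, 0.5])
theta0 = np.array([0.4, 0.6])
rng = np.random.default_rng(1)
J_mc = fisher_monte_carlo(theta0, 100.0, weights, 1000, rng)
J_exact = np.full((2, 2), 100.0)              # sum_nu (lam_t w_nu)^2 / (lam_t w_nu) = lam_t
J_pinv = np.linalg.pinv(J_exact)              # the ordinary inverse does not exist (det = 0)
```

Because the two parameters enter only through their sum, `J_exact` has rank 1, and `np.linalg.pinv` returns the Moore-Penrose generalized inverse, which is well defined even though the determinant vanishes.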
To see whether the theoretical prediction, Eq. (14), is truly observed in our experiments,
the average Bures distances between the true states and the estimated states as a function of
the ensemble size, $\lambda t$ (i.e., the nuisance parameter), are presented in Fig. 5, where
both axes of Fig. 4 are converted into their logarithms. The calculated asymptotic lower
bounds of the average Bures distances are also shown in the inset of Fig. 5. The slight
deviation from the linear slope in the small nuisance parameter regime in the inset of Fig. 5
is probably due to the mismatch between the Gaussian approximation used for producing the
simulated data $\{N^{(q)}\}$ and their genuine distribution, i.e., the Poisson distribution.

In Fig. 5 the average Bures distances clearly depend on what kind of state is to be
estimated. However, there are discrepancies between the asymptotic lower bounds (the inset of
Fig. 5) and the experimental results. The discrepancies in the small-ensemble region (the
left-hand side of Fig. 5) might be explained by higher-order effects of the errors [42]
(i.e., by deviations from the first-order approximation of the Bures distance in Eq. (4)). On
the other hand, the discrepancies in the large-ensemble region (the right-hand side of
Fig. 5) cannot be explained by such higher-order effects. Except for the results of the VNMS,
the expected asymptotic behavior (i.e., the inverse proportionality to the ensemble size) is
not observed, even in the results of the numerical simulations. Recall that the simulations
were carried out without any systematic errors. These facts rule out the possibility that the
above discrepancies stem from perturbations of the experimental conditions. Another
possibility is that the true state may be slightly biased because we determined it by
estimation, so that the (possibly biased) true state plays a major role in the discrepancy.
However, for the numerical simulations, this true state is a bona fide state, which is used
as the source to produce the artificial coincidence counts. This fact rules out the latter
possibility, too.

In the next section, we will elaborate on a possible reason for the discrepancies, and
introduce a new estimation strategy based on Akaike's information criterion [20] for reducing
the discrepancies and approaching the asymptotic lower bound, Eq. (14).



IV. AKAIKE'S INFORMATION CRITERION 

Recall that we implicitly assumed that the parametric model of the quantum states (given by
Eqs. (A7) and (A8)) is full-rank, that is, we parametrized the quantum states with 16
parameters (including the nuisance parameter); see Appendices A 1 and A 2. However,
degenerate states, such as the APSS or the HES, might be completely characterized by fewer
than 16 parameters. Consequently, the surplus parameters give rise to an ambiguity in the
numerical procedure of the MLE, i.e., in finding the maximum of the likelihood function,
Eq. (A10). These procedures were executed by FindMinimum, a function of MATHEMATICA 4.0,
which employs the multi-dimensional Powell algorithm, as in Ref. [17]. However, the minimum
found by this function is not necessarily the global minimum. Note that there are some ways
to circumvent the problem of such local minima of the likelihood function, e.g., by using
quasi-Newton methods or by employing a sophisticated iterative procedure, the
expectation-maximization (EM) algorithm followed by a unitary transformation, which is due to
Rehacek et al. [22]. Here we will, however, give a rather simple but thought-provoking
procedure based on the so-called Akaike's information criterion (AIC) [20], which eliminates
the redundant parameters.

The AIC is defined by

$$ \mathrm{AIC}^{(k)}(\hat\Theta) = -2\ln[P^{(k)}(N|\hat\Theta)] + 2k, \tag{16} $$

where $k$ is the number of independent parameters and $\ln[P^{(k)}(N|\hat\Theta)]$ is the
log-likelihood function for the quantum state $\rho^{(k)}_\Theta$, which is parametrized by
$k$ parameters. When there are several hypothetical models (with different numbers of
parameters) for estimating a certain state, the model which attains the smallest AIC can be
regarded as the most appropriate model, for the following reason. In Appendix A 2, in
explaining the MLE, we used the fact that the approximation (A12) is valid in the asymptotic
region. What Akaike found [20] is that there is a difference between the mean of the maximum
log-likelihood function (right-hand side of (A12)) and the maximum log-likelihood function
derived from the obtained data (left-hand side of (A12)), and the difference can be
approximately given by

$$ \Delta^{(k)}(\hat\Theta) = \frac{1}{\Lambda}\ln[P^{(k)}(N|\hat\Theta)] - E_{\Theta_0}\big[\ln[P^{(k)}(N|\hat\Theta)]\big|_{\Lambda=1}\big] \approx \frac{k}{\Lambda}. \tag{17} $$

Taking this correction into account, the Kullback-Leibler distance between the true
probability mass function $P_0(N)$ and its parametric model $P(N|\Theta)$, i.e., Eq. (A13),
can be minimized by reducing the value

$$ -E_{\Theta_0}\big[\ln[P^{(k)}(N|\hat\Theta)]\big|_{\Lambda=1}\big] = -\frac{1}{\Lambda}\ln[P^{(k)}(N|\hat\Theta)] + \frac{k}{\Lambda} = \frac{1}{2\Lambda}\mathrm{AIC}^{(k)}(\hat\Theta), \tag{18} $$

with respect to the estimators $\{\hat\theta\}$. Therefore, if we choose the model which
minimizes the AIC (16) among several alternative parametric models, it is ensured that this
model is the closest to the true one from the viewpoint of the Kullback-Leibler distance. The
resultant estimate is called the minimum AIC estimate (MAICE). When the maximum likelihood
estimate of a certain model is almost identical to that of another model, the MAICE becomes
the one defined with the smaller number of parameters. The definition of the MAICE thus gives
a mathematical formulation of the principle of parsimony in model selection.
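The definition of the AIC above translates directly into code. The sketch below assumes, for illustration only, that the likelihood is a product of independent Poisson distributions for the measured counts (the actual probability mass function is the one given in the Appendix); the function names are hypothetical.

```python
import math

def poisson_log_likelihood(counts, means):
    """ln P(N | Theta) for independent Poisson counts with the given model means."""
    return sum(n * math.log(m) - m - math.lgamma(n + 1) for n, m in zip(counts, means))

def aic(log_likelihood, k):
    """AIC = -2 ln(maximized likelihood) + 2 k, for a model with k free parameters."""
    return -2.0 * log_likelihood + 2 * k

# Two models that fit equally well differ only by the penalty term 2k, so the
# model with fewer parameters is preferred (principle of parsimony).
same_fit = -70.0                              # hypothetical maximized log-likelihood
penalty_gap = aic(same_fit, 16) - aic(same_fit, 12)
```

Here `penalty_gap` equals 8, twice the difference in the number of parameters, which is exactly the margin by which the larger model must improve the log-likelihood term to be selected.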



The importance of this new strategy might be more noticeable in estimating quantum states in
an infinite-dimensional Hilbert space, e.g., in estimating a Wigner function [4] or a density
matrix in the number-state representation [5]. In this situation, a somewhat vague
Fourier-frequency cutoff or a truncation of an infinite-dimensional density matrix to a
finite-dimensional one is introduced in executing the inverse Radon transformation or
quantum-state sampling, respectively. We note that Gill and Guta recently made a first
attempt at addressing this issue.

Specifically, for estimating the two qubits, we use

$$ \rho^{(k)}_\Theta = \frac{T^{(k)}_\Theta\,(T^{(k)}_\Theta)^\dagger}{\mathrm{Tr}\big[T^{(k)}_\Theta\,(T^{(k)}_\Theta)^\dagger\big]}, \tag{19} $$

where

(20)



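thus, $\rho^{(16)}_\Theta$, $\rho^{(15)}_\Theta$, $\rho^{(12)}_\Theta$, and
$\rho^{(7)}_\Theta$ represent the rank-4, rank-3, rank-2, and rank-1 density matrices,
respectively. Then the AICs are respectively given by

$$ \begin{aligned}
\mathrm{AIC}^{(16)}(\hat\Theta) &= -2\ln[P^{(16)}(N|\hat\Theta)] + 2 \times 16, \\
\mathrm{AIC}^{(15)}(\hat\Theta) &= -2\ln[P^{(15)}(N|\hat\Theta)] + 2 \times 15, \\
\mathrm{AIC}^{(12)}(\hat\Theta) &= -2\ln[P^{(12)}(N|\hat\Theta)] + 2 \times 12, \\
\mathrm{AIC}^{(7)}(\hat\Theta) &= -2\ln[P^{(7)}(N|\hat\Theta)] + 2 \times 7,
\end{aligned} \tag{21} $$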

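where $P^{(k)}(N|\Theta)$ has the same form as Eq. (A10) but with $M_\nu(\Theta)$ replaced by

$$ M^{(k)}_\nu(\Theta) = \mathrm{Tr}\big[|m_\nu\rangle\langle m_\nu|\,T^{(k)}_\Theta\,(T^{(k)}_\Theta)^\dagger\big]. \tag{22} $$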
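The parameter counts 16, 15, 12, and 7 of the four models can be checked against a standard counting argument: a rank-$r$ density matrix in dimension $d$ has $2dr - r^2 - 1$ real parameters, and the nuisance parameter adds one more. The short sketch below is only this arithmetic; the function name is hypothetical.

```python
def n_parameters(rank, dim=4):
    """Real parameters of a rank-r, dim x dim density matrix, plus one nuisance parameter."""
    return (2 * dim * rank - rank**2 - 1) + 1

# Rank-4, rank-3, rank-2, and rank-1 models for two qubits (dim = 4).
params_by_rank = {r: n_parameters(r) for r in (4, 3, 2, 1)}
```

The resulting dictionary maps the ranks 4, 3, 2, 1 to 16, 15, 12, 7 parameters, in agreement with the superscripts used for the models above.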



Among these models, we choose the one that minimizes 
the AIC. As an example, for one typical set of 
experimental coincidence counts for the VNMS 
(data acquisition time: 5 s), {N} = {615, 553, 613, 605, 
550, 576, 596, 609, 575, 622, 577, 601, 574, 569, 591, 
569}, we have the following AICs: $\mathrm{AIC}^{(16)} = 163.4$, 
$\mathrm{AIC}^{(15)} = 201.3$, $\mathrm{AIC}^{(12)} = 349.9$, and $\mathrm{AIC}^{(7)} = 2899.3$. 
Therefore we choose the rank-4 model $\rho_\Theta^{(16)}$. 

On the other 
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FIG. 6: The average Bures distances (divided by 2) between 
the true states and the MAICEs of the states are shown as a 
function of the nuisance parameter $\lambda t$. The filled plots 
represent experimental results (the vertical error bars correspond 
to one standard deviation) and the blank plots represent 
numerical simulations. The inset shows their asymptotic lower 
bounds. 



hand, for one typical set of data for the APSS (data 
acquisition time: 5 s), {N} = {42, 45, 25, 2504, 60, 56, 31, 
33, 1309, 1431, 1148, 1125, 514, 487, 576, 599}, we have 
$\mathrm{AIC}^{(16)} = 152.8$, $\mathrm{AIC}^{(15)} = 150.8$, $\mathrm{AIC}^{(12)} = 146.3$, and 
$\mathrm{AIC}^{(7)} = 208.9$. Thus the rank-2 model $\rho_\Theta^{(12)}$ is chosen. 
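
The selection rule used in the two examples above can be sketched in a few lines of Python. This is only an illustration: the function names `aic` and `select_min_aic` are hypothetical, and the dictionaries simply hold the AIC values quoted in the text, keyed by the number of parameters of each model.

```python
def aic(max_log_likelihood, n_params):
    """Akaike's information criterion: -2 ln(maximum likelihood) + 2 k."""
    return -2.0 * max_log_likelihood + 2.0 * n_params


def select_min_aic(aic_by_model):
    """Return the key (here: number of parameters) with the smallest AIC."""
    return min(aic_by_model, key=aic_by_model.get)


# AIC values quoted in the text for the rank-4, rank-3, rank-2 and rank-1
# models, which have 16, 15, 12 and 7 parameters respectively.
vnms = {16: 163.4, 15: 201.3, 12: 349.9, 7: 2899.3}
apss = {16: 152.8, 15: 150.8, 12: 146.3, 7: 208.9}

print(select_min_aic(vnms))  # rank-4 model (16 parameters)
print(select_min_aic(apss))  # rank-2 model (12 parameters)
```

Note how the penalty term 2k matters for the APSS data: the AICs of the 16-, 15- and 12-parameter models differ by only a few units, so the model with fewer parameters wins.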

It is possible to think about the other hypothetical 
models, e.g., separable model (which has 7 parameters), 
or separable and also rank-1 model (which has 5 param- 
eters), but for simplicity, our analyses were confined to 
the above 4 models. 

Figure 6 shows the average Bures distances between 
the true states and the estimated states obtained by 
employing the new estimation strategy. Their asymptotic 
lower bounds are also exhibited in the inset of Fig. 6. 
These asymptotic values were calculated according to 
Eq. ©. Note that since the true state here was also 
determined by the MAICE instead of MLE, true state 
for the APSS and that for the HES resulted in rank-2 
density matrices. 

As a result of the reduction of the parameters, the 
MAICEs substantially reduce the discrepancies between 
the asymptotic lower bounds (the inset) and the 
experimental results (filled plots: experiments, blank plots: 
Monte Carlo simulations) compared with the previous 
results (Fig. 5). Moreover, in the region where the data 
acquisition time t is greater than 2 s, i.e., $\Lambda = \lambda t > 10^3$, 
the lower bounds of Eq. (13) are almost achieved. This 
is the case even for estimating degenerate states such as 
the APSS and the HES. Note also that the intercepts of 
the asymptotic values on the ordinate shown in 
the inset of Fig. 6 are slightly lower than the 
previous ones (the inset of Fig. 5). 

Here we remark that while the numerical simulations 
continue up to $\Lambda = 100000$, the maximum data 
acquisition time of each experiment is 5 s, which corresponds 
to $\Lambda \approx 2500$. This is because the systematic errors 
mentioned above might become significant around 
$\Lambda \approx 10000$. 

The decreasing rate of the Bures distance for the APSS 
deviates slightly from the ideal value, -1. This might be 
due to redundant parameters that remain even after 
the model selection among the above 4 models, 
because for the APSS another model (e.g., the separable 
model or the separable and rank-1 model) might be more 
suitable. Thus a further reduction of the parameters 
might be possible. The discrepancies in the small-ensemble 
region (left-hand side of Fig. 6) may be explained by 
higher-order effects of the errors [43], as mentioned in 
the previous section. 

What kind of factor dominantly affects the accuracy 
of the estimation? This question has not been 
fully answered so far. In the general setting of 
the quantum parameter estimation problem [sll, I34I l36j | , 
any kind of measurement represented by a positive 
operator-valued measure (POVM) [2] is allowed. 
Then not only inseparable projective measurements 
on the two qubits but even collective measurements 
on whole ensembles are allowed. In this setting, 
it has been known that the non-commutativity of quan- 
tum mechanics has significant influence on the attain- 
able lower bounds on errors in estimating quantum states 
with multiple-parameter [30l l3Sj. Although significant 
progress has been made S LM LIl ua LM tHj] , 
finding the asymptotically optimal measurement strategy 
and obtaining the achievable lower bounds on the errors 
in estimating quantum states with multiple-parameter 
are still important open problems. 

On the other hand, in our setting, the measurement 
strategy we employed is not such an optimal 
collective-measurement strategy, but the local tomographic 
measurements represented by (A5). Nonetheless, our results 
reveal another aspect of quantum state estimation, 
that is, the nature of local measurements. Figure 6 shows 
that the error in estimating the entangled state, i.e., the 
HES (which has small entropy but large entanglement; 
see Fig. 3), is the largest among the three states in the 
asymptotic region. Thus, the existence of entanglement 
seems to degrade the accuracy of the estimation if the 
measurements are performed locally. 



V. ALTERNATIVE MEASUREMENT 
STRATEGY 

In this section, we discuss an alternative measurement 
strategy for two qubits. Since it may be extremely 
difficult to experimentally realize optimal collective 
measurements, the following discussion is restricted to 
projective measurements on just one sample in the 
ensemble, i.e., on two qubits. Note that there is another 
favorable measurement strategy, that is, self-learning 
measurements [45, 46]. However, to the best of our 
knowledge, a lower bound on the errors in estimating with this 
type of adaptive-measurement strategy is still missing. 

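It is reasonable to expect that if we employ 
measurements on the inseparable projectors on two qubits, the 
errors in estimating the entangled states might be 
reduced. For inspecting whether this expectation is true 
or not, the following specific projective measurements: 

$$\begin{array}{l}
\tfrac{1}{2}(|HH\rangle + |VV\rangle)(\langle HH| + \langle VV|), \\
\tfrac{1}{2}(|HH\rangle - |VV\rangle)(\langle HH| - \langle VV|), \\
\tfrac{1}{2}(|HV\rangle + |VH\rangle)(\langle HV| + \langle VH|), \\
\tfrac{1}{2}(|HV\rangle - |VH\rangle)(\langle HV| - \langle VH|), \\
\tfrac{1}{2}(|HD\rangle + |VX\rangle)(\langle HD| + \langle VX|), \\
\tfrac{1}{2}(|HD\rangle - |VX\rangle)(\langle HD| - \langle VX|), \\
\tfrac{1}{2}(|HX\rangle + |VD\rangle)(\langle HX| + \langle VD|), \\
\tfrac{1}{2}(|HR\rangle + |VL\rangle)(\langle HR| + \langle VL|), \\
\tfrac{1}{2}(|HR\rangle - |VL\rangle)(\langle HR| - \langle VL|), \\
\tfrac{1}{2}(|HL\rangle + |VR\rangle)(\langle HL| + \langle VR|), \\
|H\rangle\langle H| \otimes \tfrac{1}{2}I, \\
\tfrac{1}{2}I \otimes |H\rangle\langle H|, \\
|D\rangle\langle D| \otimes \tfrac{1}{2}I, \\
\tfrac{1}{2}I \otimes |D\rangle\langle D|, \\
|R\rangle\langle R| \otimes \tfrac{1}{2}I, \\
\tfrac{1}{2}I \otimes |R\rangle\langle R|,
\end{array} \qquad (23)$$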

are employed as an alternative to the local tomographic 
measurements (A5). Here, as mentioned in Appendix A 1, 
$|D\rangle = \frac{1}{\sqrt{2}}(|H\rangle + |V\rangle)$, $|X\rangle = \frac{1}{\sqrt{2}}(|H\rangle - |V\rangle)$, 
$|R\rangle = \frac{1}{\sqrt{2}}(|H\rangle + i|V\rangle)$, and $|L\rangle = \frac{1}{\sqrt{2}}(|H\rangle - i|V\rangle)$, with 
$|H\rangle$ and $|V\rangle$ being the horizontal and vertical 
polarization states, respectively. This set of 16 projective 
measurements includes 10 inseparable projectors, and 
satisfies the condition of tomographic measurements, which 
is presented in Appendix A 1. 

These projective measurements can be realized by 
slightly modifying the interferometric Bell-state analyzer 
[47]. Figure 7 shows the proposed experimental setup 
for realizing the projective measurements (23). For the 10 
inseparable projective measurements in (23), the 
down-converted photons are coupled into single-mode 
optical fibers (SMFs) and mixed at a 50/50 coupler. Then, 
if the optical path lengths of the two photons are 
appropriately adjusted and the effects of birefringence in the 
SMFs are compensated by fiber polarization controllers, 
only two photons whose state of polarization belongs to the 
anti-symmetric subspace (singlet subspace) contribute to 
the coincidence counts of photon detectors (PDs) A and 
B. This coincidence measurement is equivalent to one 
of the inseparable projective measurements, 
$\tfrac{1}{2}(|HV\rangle - |VH\rangle)(\langle HV| - \langle VH|)$. The other 9 inseparable 
measurements can be straightforwardly realized by local 
unitary transformations of the state of polarization using 
half-wave plates (HWPs) and quarter-wave plates (QWPs) 
before coupling the photons into the SMFs. 

For the remaining 6 local projective measurements in (23), 
the state of one-photon polarization is projected onto a 
particular state, e.g., $|H\rangle\langle H|$, by inserting a mirror into 
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FIG. 7: Outline of the proposed experimental setup for real- 
izing the projenctive measurements (1231 . See text for details. 



the path B (path A) and using an HWP, a QWP, a polarizer, 
and PD B2 (PD A2, not shown), as shown in the dotted 
box in Fig. 7. The other photon propagates to either PD 
A or PD B. The coincidence measurements of PD B2 (PD 
A2) and either PD A or PD B serve as the local 
projective measurements, e.g., $\tfrac{1}{2}I \otimes |H\rangle\langle H|$ 
($|H\rangle\langle H| \otimes \tfrac{1}{2}I$). 

To utilize this strategy as an alternative to the local 
one (A5), it is vital to minimize the systematic errors 
due to imperfect intensity interference in the inseparable 
measurements. In order to achieve the required high 
visibility of interference, the distinguishability of the two 
photons in any degree of freedom other than the polarization 
should be reduced. By using SMFs for enhancing the 
spatial-mode overlap of the two photons, we expect that such 
systematic errors due to the spatial degree of freedom might 
be reduced to some extent. In recent experiments, 
visibilities of this interference exceeding 98% and 
even reaching 99.4% were reported. 

The comparison between the asymptotic lower bounds of 
Eq. (14) for the above inseparable measurements (23) and 
for the conventional local ones (A5) is presented in Fig. 8. 
As expected, an improvement of the accuracy in 
estimating the HES can be found in Fig. 8(c), although the 
accuracy is decreased in estimating the APSS, as can be 
seen in Fig. 8(b). As indicated in Fig. 8(a), even for the 
VNMS, which has no entanglement at all (see Fig. 3), the 
inseparable measurements (23) work better than 
the separable ones (A5). This rather surprising result 
might be viewed as the non-locality without entanglement 
in quantum state estimation Q, 133, • We conjecture 
that for mixed states like the VNMS, no local 
tomographic measurements can attain the accuracy 
achieved by the inseparable measurements presented in 
(23). This phenomenon may stem from the fact that 
mixed states can be represented as classical mixtures 
of entangled states as well as of product 
states. 



VI. CONCLUSION 

We presented a quantitative analysis of the accuracy 
of quantum state estimation, and demonstrated 
that it depends both on the states to be estimated 
and on the measurement strategies. For this 
purpose, the SPDC process was employed for experimentally 
preparing various ensembles of bi-photon polarization 
states, and the new AIC-based estimation strategy, 
i.e., the MAICE, was introduced for eliminating the 
numerical problems in the estimation procedures. Our 
results showed that the errors of the estimated density matrices 
decreased in inverse proportion to the ensemble size 
for all of the three states we examined (the VNMS, the 
APSS, and the HES) in the asymptotic region. Besides, it 
was revealed that the existence of entanglement degrades 
the accuracy of the estimation when the measurements 
are performed locally on two qubits. The performance 
of the alternative measurement strategy, which includes 
projective measurements on inseparable bases, was 
numerically examined, and we found that the inseparable 
measurements improved the accuracy in estimating 
the VNMS as well as the HES. 

Further study of the quantum state estimation is sure 
to pave the way for understanding the ultimate rule for 
acquiring information from quantum systems. 
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APPENDIX A 

In Apps. A 1 and A 2, we give a brief review of 
tomographic measurements and maximum likelihood 
estimation (MLE), respectively, in accordance with Ref. [17]. 
Readers who are familiar with these two issues can skip these 
two Apps. In App. A 3, we discuss the Cramer-Rao inequality and 
the Fisher information matrix, which establish the 
optimality of the MLE. 



1. Tomographic measurements 

With the standard Pauli matrices $\{\sigma_i\}_{i=1}^{3}$ 
supplemented with the identity matrix $\sigma_0 = I$, an 
arbitrary density matrix of two qubits can be represented in 
Hilbert-Schmidt space as a parametric statistical model: 
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(VNMS) 
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FIG. 8: The comparison between the asymptotic lower bounds Eq. 1141 for inseparable tomographic measurements 12311 and 
for the conventional local ones ljA5jl . 
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ti=0 



where T M+j = \ (a^aj) and ^ = Here {^}^ 5 =0 
are assumed to be real. From the trace condition of a 
density matrix, $\theta^0$ is equal to one. Note that the above 
parametric model in Hilbert-Schmidt space does not 
ensure the positivity condition of a density matrix; the problem 
of positivity is revisited in App. A 2. 

When we try to estimate quantum states using the 
parametric statistical model of Eq. (A1), we should perform 
some kind of measurements. Suppose that the 
measurements are represented by projectors $|m_\nu\rangle\langle m_\nu|$. 
Taking coincidence-counting measurements on bi-photon 
polarization states as a concrete example, the projectors 
correspond to certain polarization states. After 
carrying out the measurements for data acquisition time t, the 
results are given by 



$$n_\nu = \Lambda\, \mathrm{Tr}[|m_\nu\rangle\langle m_\nu| \rho_\theta], \qquad (A2)$$

where 

$$\Lambda = \lambda t \qquad (A3)$$

is the coincidence counts without polarizers for the data 
acquisition time t. Then, $\lambda$ is the coincidence counting 
rate. Although our attention is focused on the 15 
parameters $\{\theta^\mu\}_{\mu=1}^{15}$, $\Lambda$ is also a priori unknown. Therefore, $\Lambda$ 
is appended to the list of the parameters for estimating 
the states. The parameter $\Lambda$ is thus called the nuisance 
parameter. Using Eq. (A1), Eq. (A2) becomes 

$$n_\nu = \Lambda \sum_{\mu=0}^{15} B_{\nu\mu} \theta^\mu. \qquad (A4)$$



Eq. (A4) provides a linear relationship between the 16 
parameters, $\{\theta^\mu\}_{\mu=1}^{15}$ and $\Lambda$, and the measurement results 
$\{n_\nu\}$. Subsequently, we can derive a necessary and 
sufficient condition of the measurement for determining these 
parameters, that is, the matrix $B_{\nu\mu}$ has an inverse (thus 
the measurement should consist of at least 16 projectors). 
Measurements that satisfy the above condition are called 
tomographic measurements [17]. 

A specific instance of tomographic measurements 
{|m u )(m u |}i 6 =1 are H^: 

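$$\begin{array}{llll}
|HH\rangle\langle HH| & |HV\rangle\langle HV| & |HD\rangle\langle HD| & |HL\rangle\langle HL| \\
|VH\rangle\langle VH| & |VV\rangle\langle VV| & |VD\rangle\langle VD| & |VL\rangle\langle VL| \\
|DH\rangle\langle DH| & |DV\rangle\langle DV| & |DD\rangle\langle DD| & |DL\rangle\langle DL| \\
|RH\rangle\langle RH| & |RV\rangle\langle RV| & |RD\rangle\langle RD| & |RL\rangle\langle RL|,
\end{array} \qquad (A5)$$

where $|D\rangle = \frac{1}{\sqrt{2}}(|H\rangle + |V\rangle)$, $|X\rangle = \frac{1}{\sqrt{2}}(|H\rangle - |V\rangle)$, 
$|R\rangle = \frac{1}{\sqrt{2}}(|H\rangle + i|V\rangle)$, and $|L\rangle = \frac{1}{\sqrt{2}}(|H\rangle - i|V\rangle)$, with 
$|H\rangle$ and $|V\rangle$ being the horizontal and vertical polarization 
states, respectively. Here $|HH\rangle\langle HH|$ means 
$(|H\rangle \otimes |H\rangle)(\langle H| \otimes \langle H|)$. 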

From the measurement results $\{n_\nu\}_{\nu=1}^{16} = \{N\}$, we 
can solve the linear equation, Eq. (A4), with respect to the 
parameters $\{\theta^\mu\}_{\mu=1}^{15}$ and $\Lambda$. As a result, the quantum 
state of the form of Eq. (A1) can be uniquely reconstructed. 
The solutions are explicitly expressed as 

$$\theta^\mu = \frac{1}{\Lambda} \sum_{\nu=1}^{16} (B^{-1})_{\mu\nu}\, n_\nu. \qquad (A6)$$



This estimation strategy is called the linear tomography 

E3. 
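
As a rough numerical illustration of linear tomography with the 16 local projectors of (A5), the sketch below builds the matrix relating the expansion coefficients to the counts, checks that it is invertible, and inverts noiseless counts. Several choices here are assumptions made for the example and not taken from the paper: the operator basis is normalized as $\sigma_i \otimes \sigma_j / 4$ (so the coefficient of the identity term equals one for a unit-trace state), the names `KET`, `BASIS`, `B`, and `linear_tomography` are hypothetical, and the test state and the value 2500 for the total counts are arbitrary.

```python
import numpy as np

# Single-photon polarization states in the (H, V) basis.
S = 1 / np.sqrt(2)
KET = {
    "H": np.array([1, 0], dtype=complex),
    "V": np.array([0, 1], dtype=complex),
    "D": S * np.array([1, 1], dtype=complex),
    "R": S * np.array([1, 1j], dtype=complex),
    "L": S * np.array([1, -1j], dtype=complex),
}


def projector(label):
    """Two-photon projector, e.g. 'HD' -> |HD><HD|."""
    ket = np.kron(KET[label[0]], KET[label[1]])
    return np.outer(ket, ket.conj())


# The 16 local projectors, ordered row by row as in (A5).
PROJECTORS = [projector(a + b) for a in "HVDR" for b in "HVDL"]

# Assumed operator basis: G_mu = sigma_i (x) sigma_j / 4, with G_0 = I / 4.
PAULI = [np.eye(2), np.array([[0, 1], [1, 0]]),
         np.array([[0, -1j], [1j, 0]]), np.array([[1, 0], [0, -1]])]
BASIS = [np.kron(a, b) / 4 for a in PAULI for b in PAULI]

# B[nu, mu] = Tr(|m_nu><m_nu| G_mu); real because both factors are Hermitian.
B = np.array([[np.trace(P @ G).real for G in BASIS] for P in PROJECTORS])


def linear_tomography(counts):
    """Solve counts = total * B @ c for total and the state (c_0 = 1)."""
    x = np.linalg.solve(B, np.asarray(counts, dtype=float))
    total = x[0]  # equals the total counts because c_0 = 1
    rho = sum(c * G for c, G in zip(x / total, BASIS))
    return total, rho


# Noiseless counts for a singlet state with 2500 total coincidences.
psi = (np.kron(KET["H"], KET["V"]) - np.kron(KET["V"], KET["H"])) / np.sqrt(2)
rho_true = np.outer(psi, psi.conj())
counts = [2500 * np.trace(P @ rho_true).real for P in PROJECTORS]
total, rho = linear_tomography(counts)
```

With real, Poisson-distributed counts the same inversion can return a matrix with negative eigenvalues, which is the positivity problem mentioned earlier.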



2. Maximum likelihood estimation 

The flaw of the linear tomography in App. A 1 is 
twofold. One is that there are no considerations about its 
optimality; the other is that the parametric model for 
linear tomography, Eq. (A1), does not ensure the positivity 
condition of the density matrix, as mentioned before. The 
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solution for these flaws is to use maximum likelihood es- 
timation (mle) mum. 

A density matrix that satisfies the positivity condition 
and the trace condition can be written as 

$$\rho_\Theta = \frac{T_\Theta T_\Theta^\dagger}{\mathrm{Tr}[T_\Theta T_\Theta^\dagger]}, \qquad (A7)$$



where Tq is assumed to be a normal matrix. Then, 
following Ref. 0, |2l|, we adopt the complex lower 
triangular matrix parametrized by 16 real parameters, 
= {©}, (Cholesky decomposition) as the nor- 
mal matrix Tq . It is explicitly written as 
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Tl5 /)16 



(A8) 



We should keep in mind that while the number of 
parameters for the complex lower triangular matrix (A8) 
is 16, that of the density matrix (A7) is effectively 15, 
because of the denominator $\mathrm{Tr}[T_\Theta T_\Theta^\dagger]$. 

The coincidence counts $\{N\} = \{n_\nu\}_{\nu=1}^{16}$ are assumed 
to obey the Poisson distribution with the mean being 

$$M_\nu(\Theta) = \Lambda\, \mathrm{Tr}[|m_\nu\rangle\langle m_\nu| \rho_\Theta] = \mathrm{Tr}[|m_\nu\rangle\langle m_\nu| T_\Theta T_\Theta^\dagger], \qquad (A9)$$

where we rearrange the parameters in Eq. (A8) so that 
the value of $\mathrm{Tr}[T_\Theta T_\Theta^\dagger]$ coincides with the nuisance 
parameter $\Lambda$ of Eq. (A3). Thus the probability mass 
function (Poisson density function) of the measurement 
results {N} for given values of the parameters $\{\Theta\}$ is 
written as 
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p(n\g) = n 



16 



(A10) 

Although the parametric model, Eq. (A7), guarantees 
the positivity and trace conditions, the simple linear 
relationship between the results of measurements {N} 
and the parameters $\{\Theta\}$, like Eq. (A4), has disappeared. 
Nonetheless, the MLE can be applied for inferring the 
parameters $\{\Theta\}$ from the observed results $\{n_\nu\}_{\nu=1}^{16} = \{N\}$ 
[51]. We can regard Eq. (A10) as a function on the 
16-dimensional parameter space where each point 
corresponds to a certain quantum state. It is called the 
likelihood function. Then, it is reasonable to consider that 
the point (state) which maximizes the likelihood 
function (A10) is likely to be the nearest to the true point 
(state), $\{\Theta_0^1, \Theta_0^2, \ldots, \Theta_0^{16}\} = \{\Theta_0\}$. The strategy of choosing 
the values $\{\hat\Theta^1(N), \hat\Theta^2(N), \ldots, \hat\Theta^{16}(N)\}$ which maximize 
Eq. (A10) as the estimates is called maximum likelihood 
estimation (MLE). 
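
To make the likelihood function concrete, the following sketch evaluates the Poisson log-likelihood of a set of counts for a state parametrized by a lower triangular matrix. It is only a sketch under stated assumptions: the ordering of the 16 real parameters in `t_matrix` (4 diagonal entries, then real and imaginary parts of the 6 sub-diagonal entries) is chosen for this example and need not match Eq. (A8), and the counts (600 in every channel) are idealized values for a maximally mixed state, not experimental data. No numerical optimizer is included.

```python
import math
import numpy as np

S = 1 / np.sqrt(2)
KET = {
    "H": np.array([1, 0], dtype=complex),
    "V": np.array([0, 1], dtype=complex),
    "D": S * np.array([1, 1], dtype=complex),
    "R": S * np.array([1, 1j], dtype=complex),
    "L": S * np.array([1, -1j], dtype=complex),
}


def projector(label):
    """Two-photon projector, e.g. 'HD' -> |HD><HD|."""
    ket = np.kron(KET[label[0]], KET[label[1]])
    return np.outer(ket, ket.conj())


# The 16 local projectors of (A5).
PROJECTORS = [projector(a + b) for a in "HVDR" for b in "HVDL"]


def t_matrix(theta):
    """Lower triangular 4x4 matrix from 16 real parameters (assumed ordering)."""
    theta = np.asarray(theta, dtype=float)
    T = np.zeros((4, 4), dtype=complex)
    T[np.diag_indices(4)] = theta[:4]
    rows, cols = np.tril_indices(4, k=-1)
    T[rows, cols] = theta[4:10] + 1j * theta[10:16]
    return T


def log_likelihood(theta, counts):
    """Sum over channels of ln[Poisson(n_nu; mean M_nu)]."""
    T = t_matrix(theta)
    unnormalized = T @ T.conj().T  # its trace plays the role of total counts
    total = 0.0
    for n, P in zip(counts, PROJECTORS):
        mean = np.trace(P @ unnormalized).real
        total += n * math.log(mean) - mean - math.lgamma(n + 1)
    return total


# Idealized counts: maximally mixed state with 2400 total coincidences.
counts = [600] * 16
theta_true = [math.sqrt(600.0)] * 4 + [0.0] * 12
theta_off = [1.1 * x for x in theta_true]

ll_true = log_likelihood(theta_true, counts)
ll_off = log_likelihood(theta_off, counts)
```

Because every predicted mean equals the corresponding count at `theta_true`, the log-likelihood there is the largest value attainable for these counts, and `ll_off` is strictly smaller.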

The MLE can be understood in terms of the Kullback-Leibler 
distance (relative entropy) [2, 35, 52]. It is often 
convenient to consider the natural logarithm of the likelihood 
function $P(N|\Theta)$, which is called the log-likelihood function: 
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hx[P(N\Q)] = J2hi\p(n v \e)}. (All) 



Here it is apparent that this change does not influence 
the location of the maximum. As the data acquisition 
time t is increased infinitely, the log-likelihood function 
divided by t tends, with probability 1, to the mean 
log-likelihood function for unit time, t = 1, i.e., 

$$\frac{1}{t}\ln[P(N|\Theta)] \approx \sum_{n_1=0}^{\infty}\sum_{n_2=0}^{\infty}\cdots\sum_{n_{16}=0}^{\infty} P_0(N)\,\ln[P(N|\Theta)]\big|_{t=1} \equiv E_{\Theta_0}\big[\ln[P(N|\Theta)]\big|_{t=1}\big], \qquad (A12)$$

where $P_0(N) = P(N|\Theta_0)$ is the true probability mass 
function of {N}. The difference between the true prob- 
ability mass function Pq(N) and the parametric model 
P(N\Q) can be measured by the Kullback-Leibler dis- 
tance Sim El, 



$$D(P_0(N)\,\|\,P(N|\Theta)) = E_{\Theta_0}\big[\ln[P_0(N)] - \ln[P(N|\Theta)]\big]. \qquad (A13)$$

This takes a positive value, unless $P_0(N) = P(N|\Theta)$ for 
all {N} (in this case $D(P_0(N)\,\|\,P(N|\Theta)) = 0$). Then it 
becomes clear that what we try to do by the MLE (i.e., 
to increase the log-likelihood function, Eq. (A12), with 
respect to $\{\Theta\}$) is to minimize the Kullback-Leibler 
distance between the true probability mass function $P_0(N)$ 
and its parametric model $P(N|\Theta)$. 
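
For independent Poisson channels the Kullback-Leibler distance has a simple closed form, $\sum_\nu [M_\nu^0 \ln(M_\nu^0 / M_\nu) - M_\nu^0 + M_\nu]$, where $M_\nu^0$ and $M_\nu$ are the true and model means. The sketch below evaluates this expression and also a truncated direct sum over the counts of a single channel, so the two can be compared. The function names and the numerical values are illustrative assumptions, not quantities from the experiment.

```python
import math


def kl_poisson(true_means, model_means):
    """Closed-form KL distance between products of Poisson distributions."""
    return sum(m0 * math.log(m0 / m) - m0 + m
               for m0, m in zip(true_means, model_means))


def log_pmf(n, mean):
    """Natural logarithm of the Poisson probability of n counts."""
    return -mean + n * math.log(mean) - math.lgamma(n + 1)


def kl_direct(m0, m, n_max=200):
    """Single-channel KL distance by summing over n = 0..n_max (truncated)."""
    return sum(math.exp(log_pmf(n, m0)) * (log_pmf(n, m0) - log_pmf(n, m))
               for n in range(n_max + 1))


same = kl_poisson([600.0, 600.0], [600.0, 600.0])
different = kl_poisson([600.0, 600.0], [590.0, 610.0])
```

The distance vanishes only when every model mean equals the true mean, which is why maximizing the mean log-likelihood drives the model toward the true distribution.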



3. Cramer-Rao bound and Fisher information matrix 

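The MLE is supposed to be the optimal estimation 
strategy in the following sense. The errors of the 
estimates $\{\hat\Theta^1(N), \hat\Theta^2(N), \ldots, \hat\Theta^{16}(N)\} = \{\hat\Theta(N)\}$ can be 
represented by the covariance matrix $V(\Theta_0) = [V^{ij}(\Theta_0)]$, 
which is given by 

$$V^{ij}(\Theta_0) = E_{\Theta_0}\big[(\hat\Theta^i(N) - E_{\Theta_0}[\hat\Theta^i(N)])(\hat\Theta^j(N) - E_{\Theta_0}[\hat\Theta^j(N)])\big]. \qquad (A14)$$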

The Cramer-Rao inequality provides an asymptotic lower 
bound on the covariance matrix V(6o) as follows [ssllssl 
I52L We first assume the unbiasedness of the estimates 



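$\hat\Theta^i(N)$, i.e., 

$$E_{\Theta_0}[\hat\Theta^i(N) - \Theta_0^i] = 0. \qquad (A15)$$

Then we define the score of the probability mass function 
as 

$$S_i(N|\Theta_0) = \frac{\partial}{\partial\Theta^i}\ln[P(N|\Theta)]\Big|_{\Theta=\Theta_0} = \frac{\frac{\partial}{\partial\Theta^i}P(N|\Theta)\big|_{\Theta=\Theta_0}}{P(N|\Theta_0)}. \qquad (A16)$$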
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Here, we can readily verify the mean of the score is zero, 
i.e., 

$$E_{\Theta_0}[S_i(N|\Theta_0)] = 0. \qquad (A17)$$

Thus, the covariance of the score can be written as 

$$J_{ij}(\Theta_0) = E_{\Theta_0}[S_i(N|\Theta_0) S_j(N|\Theta_0)]. \qquad (A18)$$

Equivalently, we have 

$$J_{ij}(\Theta_0) = -E_{\Theta_0}\Big[\frac{\partial^2}{\partial\Theta^i \partial\Theta^j}\ln[P(N|\Theta)]\Big|_{\Theta=\Theta_0}\Big], \qquad (A19)$$

as can be seen from differentiating Eq. (A17) with 
respect to $\Theta^j$. The covariance matrix of the score, 
$J(\Theta_0) = [J_{ij}(\Theta_0)]$, is well known as the Fisher information matrix. 
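By the Schwarz inequality for expectations, we have 

$$\Big(E_{\Theta_0}\Big[\sum_{i=1}^{16} z_i S_i(N|\Theta_0) \sum_{j=1}^{16} y_j (\hat\Theta^j(N) - \Theta_0^j)\Big]\Big)^2 \le \Big[\sum_{i=1}^{16}\sum_{j=1}^{16} z_i z_j J_{ij}(\Theta_0)\Big]\Big[\sum_{i=1}^{16}\sum_{j=1}^{16} y_i y_j V^{ij}(\Theta_0)\Big], \qquad (A20)$$

where we introduce two sets of 16 auxiliary real variables, 
$(y_1, y_2, \ldots, y_{16}) = {}^t y$ and $(z_1, z_2, \ldots, z_{16}) = {}^t z$. From 
Eq. (A17), we have 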

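$$\begin{aligned}
E_{\Theta_0}[S_i(N|\Theta_0)(\hat\Theta^j(N) - \Theta_0^j)]
&= E_{\Theta_0}[S_i(N|\Theta_0)\hat\Theta^j(N)] \\
&= \sum_{n_1=0}^{\infty}\sum_{n_2=0}^{\infty}\cdots\sum_{n_{16}=0}^{\infty} P(N|\Theta_0)\, \frac{\frac{\partial}{\partial\Theta^i}P(N|\Theta)\big|_{\Theta=\Theta_0}}{P(N|\Theta_0)}\, \hat\Theta^j(N) \\
&= \frac{\partial}{\partial\Theta^i} \sum_{n_1=0}^{\infty}\sum_{n_2=0}^{\infty}\cdots\sum_{n_{16}=0}^{\infty} P(N|\Theta)\, \hat\Theta^j(N)\Big|_{\Theta=\Theta_0} \\
&= \delta_i^j,
\end{aligned} \qquad (A21)$$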



where $\delta_i^j$ is the Kronecker delta. Consequently, the 
left-hand side of the inequality (A20) becomes 

$$\Big(\sum_{i=1}^{16}\sum_{j=1}^{16} z_i y_j \delta_i^j\Big)^2 = ({}^t y z)^2. \qquad (A22)$$



By substituting Eq. (A22) and putting $z = J(\Theta_0)^{-1} y$ in 
the Schwarz inequality (A20), we obtain 

$${}^t y J(\Theta_0)^{-1} y \le {}^t y V(\Theta_0) y, \qquad (A23)$$

that is, 

$$V(\Theta_0) \ge J^{-1}(\Theta_0), \qquad (A24)$$



which is the Cramer-Rao inequality for unbiased 
estimates. Note that most estimators used in practice are 
not unbiased. However, the Cramer-Rao bound on the 
variance of an unbiased estimator is asymptotically also 
a bound on the mean square error, 

$$V^{ij}(\Theta_0) = E_{\Theta_0}\big[(\hat\Theta^i(N) - \Theta_0^i)(\hat\Theta^j(N) - \Theta_0^j)\big], \qquad (A25)$$

of any well-behaved estimator, as shown by Gill and 
Massar in Ref. [39]. Thus, the Cramer-Rao inequality 
provides us with an asymptotic lower bound on the 
covariance matrix $V(\Theta_0)$ for a wide variety of estimates in terms 
of the Fisher information matrix [33113^1 13H IH^ . Here, we 
mention the significant fact that the maximum likelihood 
estimates are asymptotically efficient, in other words, by 
the MLE, the covariance matrix asymptotically achieves 
the Cramer-Rao lower bound j3^,|H3. In this sense, the 
MLE is the optimal strategy. 
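
The bound can be checked exactly in the simplest one-parameter case. The sketch below assumes a single Poisson channel whose mean is rate times acquisition time, with the rate as the only parameter; this toy model and the names `poisson_pmf` and `fisher_and_variance` are illustrative assumptions and not the 16-parameter model of the paper. For this model the maximum likelihood estimate of the rate is the count divided by the time, and its variance should equal the inverse of the Fisher information.

```python
import math


def poisson_pmf(n, mean):
    """Poisson probability of n counts for the given mean."""
    return math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))


def fisher_and_variance(rate, t, n_max=200):
    """Fisher information for the rate and variance of the estimate n / t.

    Both expectations are evaluated by summing over n = 0..n_max, which is
    a truncation that is adequate when rate * t is much smaller than n_max.
    """
    mean = rate * t
    fisher = 0.0
    variance = 0.0
    for n in range(n_max + 1):
        p = poisson_pmf(n, mean)
        score = n / rate - t  # derivative of ln P(n | rate) w.r.t. rate
        fisher += p * score ** 2
        variance += p * (n / t - rate) ** 2
    return fisher, variance


fisher, variance = fisher_and_variance(rate=3.0, t=5.0)
```

Here the Fisher information grows linearly with the acquisition time, so the attainable variance falls as 1/t, in line with the inverse proportionality to the ensemble size discussed in the main text.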
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